Fix 403 when downloading data for mnist tutorial #66
The caching of the |
LGTM, and will improve CI performance. Just one nit about educating people to spoof headers responsibly.
Unfortunately this is not working for me - I get files that apparently are not in gzip format. Maybe this is why the CI is still failing too?
It looks like the CI failure was just related to an execution timeout again (not related to downloading). The build artifact makes it seem like the cache is working properly (there would be print statements like "Downloading xyz" at the code cell if the data were being downloaded again). The failure due to gzip is indeed strange - the cached files are likely compressed, otherwise execution would be failing on the decompression step too, though the docs for iter_content do say that compressed files will automatically be decompressed. I'm not sure that behavior is always consistent; otherwise I don't understand how the cache was originally built with the still-compressed files. Some possible solutions that come to mind are:
I'm partial to the second option - it adds a minor amount of boilerplate but should work regardless of whether the data was decompressed when it was downloaded.
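For illustration only, here is a minimal sketch of one way to read a cached file whether or not it is still gzip-compressed. It is not necessarily the option referred to above: the helper name `read_maybe_gzipped` is hypothetical, and the approach (checking the two-byte gzip magic number before choosing how to open the file) is an assumption about what could work here.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def read_maybe_gzipped(path):
    """Return the raw bytes of `path`, decompressing only if it is gzip data.

    Hypothetical helper: it peeks at the first two bytes instead of
    trusting the file extension, so it works whether or not the file
    was already decompressed at download time.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as fh:
        return fh.read()
```

With something like this, the tutorial's loading step could call `read_maybe_gzipped(path)` on each cached file and parse the returned bytes the same way in both cases.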
Ah - another potential problem is that the download is simply failing :). Testing locally now, I'm getting 503 errors (instead of the 403 Forbidden from before). There currently isn't a check of the response status, so that should be added as well. Ultimately I think this PR puts the necessary infrastructure in place to cache the data, but we may also need to change the data source, as the original website does not seem very reliable.
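As a rough sketch of what a response-status check could look like (not the code in this PR): the function name `download`, its parameters, and the `.part` temporary-file convention are all hypothetical. It assumes `requests` is the HTTP library in use, which the mention of iter_content suggests, and relies on `raise_for_status()` to turn 4xx/5xx responses into exceptions so that an error page is never written into the cache.

```python
import os


def download(url, dest, headers=None, session=None, chunk_size=8192):
    """Stream `url` to `dest`, raising on HTTP errors such as 403 or 503.

    Hypothetical helper. `session` may be any object with a
    requests-style `get` method; by default a requests.Session is used.
    """
    if session is None:
        import requests  # imported lazily so a stub session can be passed in

        session = requests.Session()
    resp = session.get(url, headers=headers or {}, stream=True, timeout=30)
    # Raises an exception for 4xx/5xx responses instead of saving the error body.
    resp.raise_for_status()
    tmp = dest + ".part"
    with open(tmp, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            fh.write(chunk)
    # Rename only after the whole body was written, so a partial
    # download is never mistaken for a valid cached file.
    os.replace(tmp, dest)
    return dest
```

A caller would pass whatever request headers the tutorial already uses via `headers=...`, and could wrap the call in a `try`/`except` to fall back to another data source if the original site keeps returning errors.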
Sounds reasonable - thanks @rossbar!
Thank you @rossbar. Note / ToDo from the meeting: adding a warning sign. cc @melissawm
Uses header spoofing to circumvent problems with downloading the MNIST digit data. Also adds data caching to the CircleCI builds so that, in principle, the MNIST data will be cached between CI runs.
Closes #63